Optimization Framework for Map Reduce Clusters on Hadoop’s Configuration

نویسندگان

  • Trupti Mali
  • Deepti Varshney
چکیده

ARTICLE INFO Hadoop represents a Java-based distributed computing framework that is designed to support applications that are implemented via the MapReduce programming model. Hadoop performance however is significantly affected by the settings of the Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming. Existing system uses Random forest approach, which automatically tune the Hadoop configuration parameters for optimized performance for a given application running on a given cluster. Random forest approach try every combination of configuration parameter values and choose the best one. Unfortunately, this is unrealistic because of the huge number of Hadoop configuration parameter combinations. This takes a considerable amount of time, leading to impractically long times. In the proposed system we consider the constraints in the resource allocation process in the MapReduce programming model for large-scale data processing for speed up performance. For that we proposed the novel technique called Dynamic approach for performing speed up of the available resources. It contains the two major operations; they are slot utilization optimization and utilization efficiency optimization. The Dynamic technique has the three slot allocation techniques they are Dynamic Hadoop Slot Allocation (DHSA), Speculative Execution Performance Balancing (SEPB), and Slot Prescheduling. It achieves a performance speedup by a factor of over the recently proposed cost-based optimization (CBO) approach. In addition performance benefit increases with input data set size.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Map Reduce Performance in Heterogeneous Distributed System using HDFS Environment-A Review

Hadoop is a Java-based programming framework which supports for storing and processing big data in a distributed computing environment. It is using HDFS for data storing and using Map Reduce to processing that data. Map Reduce has become an important distributed processing model for large-scale data-intensive applications like data mining and web indexing. Map Reduce is widely used for short jo...

متن کامل

Master’s Thesis: A Tuning Approach Based on Evolutionary Algorithm and Data Sampling for Boosting Performance of MapReduce Programs

The Apache Hadoop data processing software is immersed in a complex environment composed of huge machine clusters, large data sets, and several processing jobs. Managing a Hadoop environment is time consuming, toilsome and requires expert users. Thus, lack of knowledge may entail misconfigurations degrading the cluster performance. Indeed, users spend a lot of time tuning the system instead of ...

متن کامل

Dynamic configuration and collaborative scheduling in supply chains based on scalable multi-agent architecture

Due to diversified and frequently changing demands from customers, technological advances and global competition, manufacturers rely on collaboration with their business partners to share costs, risks and expertise. How to take advantage of advancement of technologies to effectively support operations and create competitive advantage is critical for manufacturers to survive. To respond to these...

متن کامل

Enhancing Map-Reduce Framework for Bigdata with Hierarchical Clustering

MapReduce is a software framework that allows certain kinds of parallelizable or distributable problems involving large data sets to be solved using computing clusters. This paper introduces our experience of grouping internet users by mining a huge volume of web access log of up to 500 gigabytes. The application is realized using hierarchical clustering algorithms with Map-Reduce, a parallel p...

متن کامل

Reconfiguration of distribution systems to improve reliability and reduce power losses using Imperialist Competitive Algorithm

Distribution systems can be operated in multiple configurations since they are possible combinations of radial and loop feeders. Each configuration leads to its own power losses and reliability level of supplying electric energy to customers. In order to obtain the optimal configuration of power networks, their reconfiguration is formulated as a complex optimization problem with different objec...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016